Method and system for reassembling fragmented datagrams utilizing a plurality of concurrently accessible reassembly queues

ABSTRACT

A method, system and program product for reassembling fragmented datagrams is described. A plurality of fragments of a plurality of datagrams are received by a recipient data processing system. In response to receipt of the plurality of fragments, a plurality of processes concurrently access a reassembly data structure to store the plurality of fragments, such that the plurality of datagrams are incrementally reassembled from the plurality of fragments. In one embodiment, the reassembly data structure can be implemented as a list containing a plurality of reassembly queues that each contain one or more queue entries for reassembling a respective datagram. Data integrity of the reassembly data structure can be maintained by associating a respective one of a plurality of locks with each of the plurality of reassembly queues so that only one process at a time can access each reassembly queue.

BACKGROUND OF THE INVENTION

[0001] 1. Technical Field

[0002] The present invention relates in general to data processing and, in particular, to data processing system communication. Still more particularly, the present invention relates to a data processing system, method and program product for reassembling fragmented datagrams.

[0003] 2. Description of the Related Art

[0004] The Internet can generally be defined as a worldwide collection of heterogeneous communication networks and associated gateways, bridges and routers that all employ the TCP/IP (Transmission Control Protocol/Internet Protocol) suite of protocols to communicate data packets between a source and one or more destinations. As is well known to those skilled in the art, the TCP/IP suite of protocols corresponds to layers 3 and 4 (the network and transport layers, respectively) of the seven-layer International Organization for Standardization Open Systems Interconnection (ISO/OSI) reference model, which provides a convenient framework for discussing communication protocols. The ISO/OSI reference model further includes physical and link layers (layers 1 and 2, respectively) below the network and transport layers, and session, presentation, and application layers (layers 5 through 7, respectively) above the network and transport layers.

[0005] In communicating TCP/IP datagrams between devices over the Internet (or other networks), the maximum transmission unit (MTU) size of the various interfaces through which datagrams are communicated may differ. Accordingly, during output of a datagram, the sending IP layer checks whether the datagram can be sent unfragmented. If so, the sending IP layer outputs the datagram through its interface unfragmented. However, if the sending IP layer determines that the datagram cannot be transmitted unfragmented because the datagram size exceeds the interface's MTU, the sending IP layer disassembles the datagram into fragments no larger than its interface's MTU and outputs the fragments. During transmission, these fragments may be subject to further fragmentation by routers along the path to the recipient.
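A minimal sketch of the sender-side split just described, assuming IPv4 semantics: a fixed 20-byte header with no options, fragment offsets expressed in 8-byte units (so every fragment except the last carries a multiple of 8 data bytes), and a more-fragments (MF) flag set on every fragment but the last. The function `fragment_payload` and its tuple format are hypothetical and are not part of the described system.

```python
from typing import List, Tuple

IP_HEADER_LEN = 20  # assumption: fixed IPv4 header length, no options


def fragment_payload(payload: bytes, mtu: int) -> List[Tuple[int, bool, bytes]]:
    """Split a datagram payload into (byte_offset, more_fragments, data)
    tuples that each fit, together with a header, within `mtu` bytes."""
    if len(payload) + IP_HEADER_LEN <= mtu:
        return [(0, False, payload)]  # fits: send unfragmented
    # Round the per-fragment data size down to a multiple of 8, because
    # IPv4 expresses fragment offsets in 8-byte units.
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    fragments = []
    for off in range(0, len(payload), max_data):
        chunk = payload[off:off + max_data]
        more = off + max_data < len(payload)  # MF set on all but the last
        fragments.append((off, more, chunk))
    return fragments
```

For example, a 100-byte payload sent over a 68-byte MTU yields data fragments of 48, 48 and 4 bytes at byte offsets 0, 48 and 96.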

[0006] When the fragments of the datagram are received by the intended recipient, the receiving IP layer at the recipient must reconstruct the original datagram from the received fragments. Because the recipient does not necessarily receive the fragments in order and may receive duplicate fragments, the receiving IP layer needs a mechanism to buffer received fragments and reassemble them into a datagram. In the prior art, this mechanism is implemented as a single reassembly queue shared by the entire IP layer, as described in Chapter 10 of Stevens, TCP/IP Illustrated, Volume 2, which is incorporated herein by reference as background material.
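The buffering requirement can be illustrated with a small, hypothetical helper that rebuilds one datagram from fragments received in any order, discarding duplicates. It assumes each fragment is a (byte offset, more-fragments flag, data) tuple and that fragments never partially overlap; it is not the BSD code described by Stevens, which also trims overlapping data and times out incomplete datagrams.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Fragment = Tuple[int, bool, bytes]  # (byte offset, more-fragments flag, data)


def reassemble(fragments: Iterable[Fragment]) -> Optional[bytes]:
    """Rebuild a datagram from fragments received in any order.

    Returns None while a hole remains or while the final fragment
    (more-fragments flag clear) has not yet arrived.
    """
    by_offset: Dict[int, Tuple[bool, bytes]] = {}
    for off, more, data in fragments:
        by_offset.setdefault(off, (more, data))  # keep first copy, drop duplicates
    expected = 0
    parts: List[bytes] = []
    for off in sorted(by_offset):
        more, data = by_offset[off]
        if off != expected:
            return None  # gap before this fragment
        parts.append(data)
        expected += len(data)
        if not more:
            return b"".join(parts)  # final fragment reached with no holes
    return None  # final fragment still missing
```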

[0007] The present invention recognizes that the use of a single reassembly queue by the entire IP layer undesirably limits communication performance. For example, in a symmetric multiprocessor (SMP) computer system, such as those commonly employed as e-commerce servers and the like, a large number of processes may be receiving numerous fragments belonging to different datagrams. Thus, many processes may need to access the reassembly queue at the same time. However, to ensure the data integrity of the single reassembly queue under these conditions, access by the processes to the single reassembly queue is serialized by a lock that can be owned by only one process at a time. Contention for ownership of the lock can therefore severely degrade performance, particularly under high traffic conditions.

SUMMARY OF THE INVENTION

[0008] The present invention overcomes the foregoing and additional shortcomings in the art by providing an improved data processing system, method and program product for reassembling fragmented datagrams utilizing multiple reassembly queues that can be accessed in parallel by multiple processes.

[0009] In accordance with the present invention, a plurality of fragments of a plurality of datagrams are received by a recipient data processing system. In response to receipt of the plurality of fragments, a plurality of processes concurrently access a reassembly data structure to store the plurality of fragments, such that the plurality of datagrams are incrementally reassembled from the plurality of fragments. In one embodiment, the reassembly data structure can be implemented as a list containing a plurality of reassembly queues that each contain one or more queue entries for reassembling a respective datagram. Data integrity of the reassembly data structure can be maintained by associating a respective one of a plurality of locks with each of the plurality of reassembly queues so that only one process at a time can access each reassembly queue.

[0010] All objects, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objects and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

[0012] FIG. 1 depicts an illustrative embodiment of a data processing system with which the present invention may advantageously be utilized;

[0013] FIG. 2 illustrates a linked list data structure for parallel datagram reassembly by multiple processes in accordance with a preferred embodiment of the present invention; and

[0014] FIG. 3 is a high level logical flowchart of a method of reassembling fragmented datagrams in accordance with a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENT

[0015] With reference now to the figures and in particular with reference to FIG. 1, there is depicted a high level block diagram of an exemplary data processing system 8 in accordance with the present invention. In this illustrated embodiment, data processing system 8 is a server computer system that is coupled to a network 22 for communication with a second data processing system 6, which may function, for example, as a client. Communication between data processing systems 6 and 8 can employ any of a number of known or future protocols that support fragmentation of datagrams, including without limitation TCP/IP, UDP (User Datagram Protocol) over IP, WAP (Wireless Application Protocol), or Bluetooth. As discussed above, datagram fragmentation may be necessary because one or more interfaces within network 22 traversed by the communication may have an MTU that is smaller than the datagrams to be transmitted.

[0016] As illustrated, data processing system 8 includes a number of processors 10a-10m, each of which has a processor core 12, containing registers, instruction flow logic and execution units utilized to execute program instructions, and an on-chip cache hierarchy 14 that stages data and instructions to the associated processor core 12 from system memory 18. Processors 10 access system memory 18 via a processor bus 16 and a memory controller 17.

[0017] Processor bus 16 is further coupled, via mezzanine bus bridge 26, to a mezzanine bus 30, which may be implemented as a Peripheral Component Interconnect (PCI) local bus, for example. Mezzanine bus bridge 26 provides both a low latency path through which processors 10 may directly access I/O devices 32 and storage devices 34 that are mapped to bus memory and/or I/O address spaces and a high bandwidth path through which I/O devices 32 and storage devices 34 may access system memory 18. I/O devices 32 may include, for example, a display device, input devices, and serial and parallel ports. Storage devices 34, on the other hand, may include optical or magnetic disks that provide non-volatile storage for operating system and application software. A network interface card 20 is also coupled to mezzanine bus 30 to support communication with network 22.

[0018] In operation, data processing system 8 operates under the control of a multitasking operating system (OS) 40, such as AIX (Advanced Interactive eXecutive), which is at least partially resident within system memory 18. OS 40 supports the concurrent execution of multiple processes by processors 10a-10m, and one of the functions of the processes of OS 40 is to fragment and reassemble datagrams communicated between data processing systems 6 and 8 via network 22. To reassemble fragmented datagrams, OS 40 creates and maintains a data structure, referred to herein as a reassembly data structure (RDS) 42, having an associated lock table (LT) 44 from which each process must obtain a lock prior to accessing RDS 42.

[0019] Referring now to FIG. 2, there is illustrated a more detailed depiction of an exemplary embodiment of reassembly data structure (RDS) 42 and lock table (LT) 44 in accordance with the present invention. As illustrated, in contrast to prior art datagram reassembly techniques, which utilize only a single reassembly queue that can be accessed by only one process at a time, the present invention employs an RDS 42 including multiple (in this case N) reassembly queues 50 that can be accessed concurrently by multiple processes.

[0020] Although not required for all embodiments of the present invention, in the illustrated embodiment, the N reassembly queues 50 are organized into a list, in which each reassembly queue 50 contains one or more queue entries 52, each for reassembling a respective datagram. The system memory address of the top of each reassembly queue 50 is specified by a head pointer 56 that points to the first location 54 in the first queue entry 52 of that reassembly queue 50. Each queue entry 52 is constructed as a doubly-linked list of storage locations 54 for storing datagram fragments. In FIG. 2, locations 54 containing fragments are illustrated with shading, and empty locations 54 are illustrated without shading.
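One way to model FIG. 2 in code is sketched below. The class names are hypothetical; a Python list stands in for the list of N reassembly queues 50 and for head pointer 56, and each queue entry 52 keeps its storage locations 54 in a doubly-linked list ordered by fragment offset rather than preallocating empty locations. A fragment whose offset is already present is treated as a duplicate and dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    """Storage location 54: one node of a doubly-linked list holding a fragment."""
    offset: int  # byte offset of the fragment within the datagram
    data: bytes
    more: bool   # more-fragments flag copied from the fragment header
    # Links are excluded from repr/eq to avoid infinite recursion.
    prev: Optional["Location"] = field(default=None, repr=False, compare=False)
    next: Optional["Location"] = field(default=None, repr=False, compare=False)


@dataclass
class QueueEntry:
    """Queue entry 52: the fragments of one datagram, kept in offset order."""
    ip_id: int
    head: Optional[Location] = None

    def insert(self, loc: Location) -> bool:
        """Link `loc` in offset order; return False for a duplicate offset."""
        prev, cur = None, self.head
        while cur is not None and cur.offset < loc.offset:
            prev, cur = cur, cur.next
        if cur is not None and cur.offset == loc.offset:
            return False  # duplicate fragment: discard
        loc.prev, loc.next = prev, cur
        if cur is not None:
            cur.prev = loc
        if prev is None:
            self.head = loc  # new first location of this entry
        else:
            prev.next = loc
        return True


@dataclass
class ReassemblyQueue:
    """Reassembly queue 50: entries for datagrams hashed to this queue."""
    entries: List[QueueEntry] = field(default_factory=list)


@dataclass
class ReassemblyDataStructure:
    """RDS 42: a list of N independent reassembly queues."""
    n: int
    queues: List[ReassemblyQueue] = field(init=False)

    def __post_init__(self) -> None:
        self.queues = [ReassemblyQueue() for _ in range(self.n)]
```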

[0021] To ensure the data integrity of RDS 42, each reassembly queue 50 has an associated lock 58 within lock table 44. In order for a process to modify a reassembly queue 50 (e.g., by inserting a fragment or by deallocating a reassembled datagram from a queue entry 52), the process must gain ownership of the lock 58 associated with that reassembly queue 50. Thus, the present invention permits up to N processes to concurrently access RDS 42, rather than only a single process as in the prior art.
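A per-queue lock table can be sketched with one `threading.Lock` per reassembly queue, under the assumption that Python threads stand in for the OS processes. Holding the lock for one queue does not block access to any other queue, which is what allows up to N concurrent accessors. `LockTable` and `append_fragment` are hypothetical names.

```python
import threading
from typing import List


class LockTable:
    """Lock table 44 (sketch): lock i serializes access to reassembly queue i."""

    def __init__(self, n_queues: int) -> None:
        self._locks: List[threading.Lock] = [threading.Lock() for _ in range(n_queues)]

    def lock_for(self, queue_index: int) -> threading.Lock:
        return self._locks[queue_index]


def append_fragment(table: LockTable, queues: List[list], index: int, frag: bytes) -> None:
    """Modify reassembly queue `index` only while owning its associated lock."""
    with table.lock_for(index):
        queues[index].append(frag)
```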

[0022] With reference now to FIG. 3, there is illustrated a high level logical flowchart of a method by which a process of communication software (e.g., OS 40) reassembles fragmented datagrams in accordance with a preferred embodiment of the present invention. As will be appreciated from the foregoing, the embodiment of the invention shown in FIG. 2 permits up to N such processes to concurrently access RDS 42.

[0023] As shown in FIG. 3, the process begins at block 70 and thereafter proceeds to block 72. The process iterates at block 72 until the process receives at least a fragment of a datagram to process. In response to receipt of at least a fragment of a datagram, the process proceeds to block 74, which depicts a determination of whether a complete datagram or only a datagram fragment has been received. The determination depicted at block 74 may be made, for example, by comparing the length of the received data with the value of a length field in the header or by checking the fragmentation fields in the header of the received data (e.g., the more-fragments flag, IP_MF, and the fragment offset of an IP header; a received IP datagram is a fragment if the more-fragments flag is set or the fragment offset is nonzero). In response to a determination at block 74 that a complete datagram has been received, no access to RDS 42 is required, and the process passes to block 84, which is described below. However, if a determination is made at block 74 that only a fragment of a datagram has been received, the process proceeds to block 76 and following blocks.
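The fragmentation-field check at block 74 might look like the following for IPv4. The constants mirror the BSD `<netinet/ip.h>` definitions of `IP_DF`, `IP_MF` and `IP_OFFMASK`; the sketch assumes the 16-bit flags/offset field has already been converted to host byte order. The offset test is what catches the last fragment of a datagram, whose MF flag is clear.

```python
IP_DF = 0x4000       # don't-fragment flag
IP_MF = 0x2000       # more-fragments flag
IP_OFFMASK = 0x1FFF  # 13-bit fragment offset, in 8-byte units


def is_fragment(ip_off: int) -> bool:
    """Block 74 (sketch): True if MF is set or the fragment offset is nonzero;
    False means a complete, unfragmented datagram was received."""
    return bool(ip_off & IP_MF) or (ip_off & IP_OFFMASK) != 0


def fragment_byte_offset(ip_off: int) -> int:
    """Convert the 13-bit offset field to a byte offset within the datagram."""
    return (ip_off & IP_OFFMASK) * 8
```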

[0024] Block 76 illustrates the process selecting a reassembly queue 50 within RDS 42 in which the received fragment will be combined with other fragments to reassemble a datagram. In a preferred embodiment, the reassembly queue 50 is selected by hashing the ip_id (i.e., the datagram ID) appearing in the fragment's header, for example, utilizing a hash function such as MOD (i.e., the remainder function). Assuming that received datagram fragments have well distributed IDs, a simple hash function like MOD tends to hash fragments of different datagrams to different reassembly queues, thereby minimizing contention over the associated locks 58 in LT 44. Thus, the process should experience minimal, if any, contention when obtaining the lock 58 within LT 44 associated with the selected reassembly queue 50, as shown at block 77. Next, at block 78, the process identifies the appropriate queue entry 52 within the selected reassembly queue 50 by comparing the ip_id of the datagram fragment with the ip_id values of other fragments already stored in the selected reassembly queue 50. If no match is found in the selected reassembly queue 50, the fragment is the first received fragment of a new datagram, and a new queue entry 52 is accordingly allocated to the datagram.
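Blocks 76 and 78 can be sketched as two small functions. The queue count `N_QUEUES = 16` is an arbitrary assumption, and plain dictionaries with hypothetical keys `ip_id` and `fragments` stand in for queue entries 52. Because the hash depends only on ip_id, every fragment of a given datagram lands in the same queue, while consecutive IDs spread evenly across the queues.

```python
from typing import Dict, List

N_QUEUES = 16  # assumed number of reassembly queues (N in FIG. 2)


def select_queue(ip_id: int, n_queues: int = N_QUEUES) -> int:
    """Block 76: hash the datagram ID to a queue index with MOD."""
    return ip_id % n_queues


def find_or_allocate_entry(queue: List[Dict], ip_id: int) -> Dict:
    """Block 78: return the entry whose ip_id matches, allocating a new
    entry if this is the first fragment seen for the datagram."""
    for entry in queue:
        if entry["ip_id"] == ip_id:
            return entry
    entry = {"ip_id": ip_id, "fragments": {}}
    queue.append(entry)
    return entry
```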

[0025] As shown at block 80, the received fragment is then inserted in the appropriate location 54 in the identified queue entry 52, for example, by reference to the value of the ip_offset field in the fragment's header. In this manner, the first fragment is loaded into the first location 54, the second fragment is loaded into the second location 54, and so on, regardless of the chronological order in which the fragments are received. A determination is made at block 82 whether or not the received fragment completes the fragmented datagram being reassembled in RDS 42. If not, the process returns to block 72, which has been described. If, however, the process determines at block 82 that reassembly of the datagram is complete, the process shown in FIG. 3 proceeds to block 84, which illustrates the process sending the reassembled datagram to the next higher layer protocol (e.g., from the network layer to the transport layer) for processing. The process then deallocates the queue entry 52 allocated to the reassembled datagram, releases any lock 58 obtained at block 77, and terminates processing at block 86.
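The blocks of FIG. 3 can be pulled together in a short sketch. It assumes Python threads stand in for the OS processes, a dictionary keyed by ip_id stands in for each reassembly queue 50 and its queue entries 52, and fragments never overlap, so completion can be detected by comparing the bytes received with the total length learned from the final (MF-clear) fragment. As a design choice, the sketch releases the per-queue lock after handling each fragment, whether or not the datagram completes, and joins the fragments outside the lock. The class and method names are hypothetical.

```python
import threading
from typing import Dict, List, Optional


class ParallelReassembler:
    """N reassembly queues, each guarded by its own lock, so fragments of
    datagrams hashing to different queues are processed concurrently."""

    def __init__(self, n_queues: int = 8) -> None:
        self.n = n_queues
        self.queues: List[Dict[int, dict]] = [{} for _ in range(n_queues)]
        self.locks = [threading.Lock() for _ in range(n_queues)]

    def handle_fragment(self, ip_id: int, offset: int, more: bool,
                        data: bytes) -> Optional[bytes]:
        """Store one fragment; return the datagram once it is complete."""
        q = ip_id % self.n                      # block 76: MOD hash on ip_id
        with self.locks[q]:                     # block 77: per-queue lock
            entry = self.queues[q].setdefault(  # block 78: find or allocate
                ip_id, {"frags": {}, "total": None, "have": 0})
            if offset in entry["frags"]:
                return None                     # duplicate fragment: drop
            entry["frags"][offset] = data       # block 80: place by offset
            entry["have"] += len(data)
            if not more:
                entry["total"] = offset + len(data)
            if entry["total"] is None or entry["have"] != entry["total"]:
                return None                     # block 82: not complete yet
            del self.queues[q][ip_id]           # deallocate the queue entry
        frags = entry["frags"]
        # Block 84: hand the joined datagram to the next higher layer.
        return b"".join(frags[off] for off in sorted(frags))
```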

[0026] As has been described, the present invention provides an improved method, system, and program product for reassembling fragmented datagrams. In accordance with the present invention, datagrams are reassembled in a reassembly data structure that permits multiple processes to concurrently access respective ones of multiple reassembly queues. In this manner, contention for access to the reassembly queues is greatly reduced, and communication performance is increased, particularly for protocols such as UDP that tend to have highly fragmented datagrams.

[0027] While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. For example, although aspects of the present invention have been described with respect to a computer system executing software that directs the functions of the present invention, it should be understood that the present invention may alternatively be implemented as a program product for use with a data processing system. Programs defining the functions of the present invention can be delivered to a data processing system via a variety of signal-bearing media, which include, without limitation, non-rewritable storage media (e.g., CD-ROM), rewritable storage media (e.g., a floppy diskette or hard disk drive), and communication media, such as digital and analog networks. It should be understood, therefore, that such signal-bearing media, when carrying or encoding computer readable instructions that direct the functions of the present invention, represent alternative embodiments of the present invention.

What is claimed is:
 1. A method for reassembling fragmented datagrams, said method comprising: receiving a plurality of fragments of a plurality of datagrams; and a plurality of processes concurrently accessing a reassembly data structure to store the plurality of fragments, such that the plurality of datagrams are incrementally reassembled from the plurality of fragments.
 2. The method of claim 1, wherein the reassembly data structure includes a plurality of reassembly queues that each contains one or more entries for reassembling datagrams, and wherein concurrently accessing comprises concurrently accessing different ones of said plurality of reassembly queues.
 3. The method of claim 2, and further comprising each of said plurality of processes selecting a respective reassembly queue to access by hashing a datagram identifier within a respective one of said plurality of fragments.
 4. The method of claim 2, and further comprising each of said plurality of processes obtaining a respective one of a plurality of locks prior to accessing said reassembly data structure, wherein each of said plurality of locks is associated with a respective one of said plurality of reassembly queues.
 5. The method of claim 1, and further comprising each of said plurality of processes obtaining a respective one of a plurality of locks prior to accessing said reassembly data structure.
 6. The method of claim 1, and further comprising: in response to completing reassembly of a datagram, passing the reassembled datagram to a higher protocol layer and deallocating the reassembled datagram from the reassembly data structure.
 7. The method of claim 1, wherein a plurality of processes concurrently accessing said reassembly data structure comprises a plurality of operating system processes concurrently accessing said reassembly data structure.
 8. The method of claim 1, wherein receiving a plurality of fragments of a plurality of datagrams comprises receiving a plurality of fragments of Internet Protocol (IP) datagrams.
 9. A data processing system, comprising: processing resources; a memory coupled to the processing resources, said memory containing: a reassembly data structure; and communication software executable by the processing resources as a plurality of processes, wherein the plurality of processes, responsive to receipt at the data processing system of a plurality of fragments of a plurality of datagrams, concurrently access a reassembly data structure to store the plurality of fragments, such that the plurality of datagrams are incrementally reassembled from the plurality of fragments.
 10. The data processing system of claim 9, wherein the reassembly data structure includes a plurality of reassembly queues that each contains one or more entries for reassembling datagrams, and wherein the plurality of processes concurrently access different ones of said plurality of reassembly queues.
 11. The data processing system of claim 10, wherein each of said plurality of processes selects a respective reassembly queue to access by hashing a datagram identifier within a respective one of said plurality of fragments.
 12. The data processing system of claim 10, said memory further comprising a lock table having a plurality of locks each associated with a respective one of said plurality of reassembly queues, wherein each of said plurality of processes obtains a respective one of the plurality of locks prior to accessing said reassembly data structure.
 13. The data processing system of claim 9, wherein said plurality of processes, responsive to completing reassembly of a datagram, pass the reassembled datagram to a higher protocol layer and deallocate the reassembled datagram from the reassembly data structure.
 14. The data processing system of claim 9, wherein the plurality of processes comprise operating system processes.
 15. The data processing system of claim 9, wherein the plurality of fragments comprise fragments of Internet Protocol (IP) datagrams.
 16. A program product, comprising: a computer-usable medium; within said computer-usable medium, communication software executable as a plurality of processes, wherein the plurality of processes, responsive to receipt at a data processing system of a plurality of fragments of a plurality of datagrams, concurrently access a reassembly data structure to store the plurality of fragments, such that the plurality of datagrams are incrementally reassembled from the plurality of fragments.
 17. The program product of claim 16, wherein the reassembly data structure includes a plurality of reassembly queues that each contains one or more entries for reassembling datagrams, and wherein the plurality of processes concurrently access different ones of said plurality of reassembly queues.
 18. The program product of claim 17, wherein each of said plurality of processes selects a respective reassembly queue to access by hashing a datagram identifier within a respective one of said plurality of fragments.
 19. The program product of claim 17, wherein the reassembly data structure has an associated lock table having a plurality of locks each associated with a respective one of said plurality of reassembly queues, wherein each of said plurality of processes obtains a respective one of the plurality of locks prior to accessing said reassembly data structure.
 20. The program product of claim 16, wherein said plurality of processes, responsive to completing reassembly of a datagram, pass the reassembled datagram to a higher protocol layer and deallocate the reassembled datagram from the reassembly data structure.
 21. The program product of claim 16, wherein the communication software forms a portion of an operating system, and wherein the plurality of processes comprise operating system processes.